Abstract

This paper studies a Non-convex State-dependent Linear Quadratic Regulator (NSLQR) problem, in which the control 
penalty weighting matrix R in the performance index is state-dependent. A necessary and sufficient condition for the 
optimal solution is established with a rigorous proof by Euler-Lagrange Equation. It is found that the optimal solution of the 
NSLQR problem can be obtained by solving a Pseudo-Differential-Riccati-Equation (PDRE) simultaneously with the closed- 
loop system equation. A Comparison Theorem for the PDRE is given to facilitate solution methods for the PDRE. A linear 
time-variant system is employed as an example in simulation to verify the proposed optimal solution. As a non-trivial 
application, a goal pursuit process in psychology is modeled as an NSLQR problem, and two typical goal pursuit behaviors 
found in humans and animals are reproduced using different control weightings R(x). It is found that these two behaviors 
save control energy and cause less stress than the Conventional Control Behavior typified by the LQR control with a constant 
control weighting R, in situations where only the goal discrepancy at the terminal time is of concern, such as in Marathon 
races and target hitting missions. 
Introduction

1.1 Problem Definition 

In this paper, we seek an optimal control law u = k(x,t), for
which the performance index

$$ J(x(t),u(t)) = \int_{t_0}^{t_f} L(x(t),u(t),t)\,dt + \phi(x(t_f),t_f) = \frac{1}{2}\int_{t_0}^{t_f} \left( x^T(t)Q(t)x(t) + u^T(t)R(x(t))u(t) \right) dt + \frac{1}{2}x^T(t_f)S(t_f)x(t_f) \tag{1} $$

is minimized along the associated closed-loop system trajectory of
the Linear Time-variant (LTV) system

$$ \dot{x}(t) = A(t)x(t) + B(t)u(t), \qquad x(t_0) = x_0 \tag{2} $$



where $u(t) \in \mathbb{R}^m$ is the control input, $x(t) \in \mathbb{R}^n$ is the system state, $t_0$
is the starting time, $t_f$ is the terminal time and $x_0$ is the initial value
of x(t) at time $t_0$. The coefficients $A(t), Q(t), S(t_f) \in \mathbb{R}^{n \times n}$,
$B(t) \in \mathbb{R}^{n \times m}$, $R(x(t)) \in \mathbb{R}^{m \times m}$. To simplify notation, the dependence
of variables on t is omitted when no confusion will be introduced
in the rest of the paper. It is assumed that A, B, Q are continuous
in t, R(x) is differentiable with respect to x, and $\partial R / \partial x$ is bounded.

The coefficients Q and S are positive semi-definite symmetric
matrices for all $t \in [t_0, t_f]$, and R(x) is a positive definite
symmetric matrix for all x(t), $t \in [t_0, t_f]$. Additional conditions on
R(x) will be imposed in order to obtain the sufficiency for
optimality.

It is noted that when the state-dependent matrix R(x) in Eq (1)
is replaced by a time-dependent matrix R(t), the performance
index J is quadratic and convex in both x and u, and Eq (1) and (2)
constitute the standard Linear Quadratic Regulator (LQR)
problem. The classical LQR theory provides a mature way to
find an optimal control law for such a convex quadratic
performance index. However, the state-dependent coefficient
R(x) in Eq (1) renders the performance index no
longer convex in both x and u, which makes the LQR theory
inapplicable here. The formalism of the LQR theory is
nevertheless still useful, so we denote the problem defined above as a
Non-convex State-dependent LQR (NSLQR) problem. The
associated Riccati Equation of the NSLQR problem is named the
Pseudo-Differential-Riccati-Equation (PDRE). In this paper, a
necessary and sufficient condition for the optimal solution of the
defined NSLQR problem is presented, with an additional
condition on R(x), and the optimality of the solution is proven
with the Euler-Lagrange Equation. The PDRE is also studied to
obtain the optimal solution, and a theorem is given to estimate the
solution of the PDRE.
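
As a minimal sketch of how the performance index in Eq (1) could be evaluated numerically for a sampled trajectory, the following Python snippet applies the trapezoidal rule to the running cost and adds the terminal term. The function name `nslqr_cost`, its arguments, and the sampled-trajectory representation are illustrative assumptions, not part of the paper.

```python
import numpy as np

def nslqr_cost(ts, xs, us, Q, R, S):
    """Approximate J in Eq (1) for a sampled trajectory (hypothetical helper).

    ts: (N,) increasing sample times; xs: (N, n) states; us: (N, m) controls.
    Q: callable t -> (n, n) matrix; R: callable x -> (m, m) matrix; S: (n, n) matrix.
    """
    # Running cost 0.5 * (x^T Q(t) x + u^T R(x) u) at every sample
    running = np.array([
        0.5 * (x @ Q(t) @ x + u @ R(x) @ u)
        for t, x, u in zip(ts, xs, us)
    ])
    # Trapezoidal rule written out explicitly over the sample grid
    integral = np.sum(0.5 * (running[1:] + running[:-1]) * np.diff(ts))
    # Terminal cost 0.5 * x(tf)^T S x(tf)
    terminal = 0.5 * xs[-1] @ S @ xs[-1]
    return integral + terminal
```

For example, with a scalar state following x(t) = 1/(1+t), zero control, Q = 1 and S = 0, the running cost integrates to 0.25 on [0, 1], which gives a quick sanity check of the quadrature.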
1.2 Related Work

A similar problem has been studied in the context of the State-Dependent Riccati Equation (SDRE) control strategy since the mid-90's. The strategy, proposed by Pearson [1] and expanded by Wernli and Cook [2], was independently studied by Mracek and Cloutier [3] in 1998. Friedland [4], Salamci and Gokbilen [5], and Cimen et al. [6,7] also contributed to the existence of solutions as well as the properties of optimality and stability. In the SDRE strategy, a nonlinear system is "factored" into the product of the state vector and a state-dependent matrix-valued function in the form

$$ \dot{x}(t) = f(x(t)) + g(x(t))u(t) = A^*(x(t))x(t) + B^*(x(t))u(t), \qquad x(t_0) = x_0 \tag{3} $$

which is a linear structure having state-dependent coefficients. Borrowing the LQR theory, the SDRE strategy postulates an approximately optimal feedback control law as

$$ u(x) = -R^{*-1}(x)B^{*T}(x)P^*(x)x(t) \tag{4} $$

for the performance index

$$ J^* = \frac{1}{2}\int_{t_0}^{\infty} \left( x^T(t)Q^*(x)x(t) + u^T(t)R^*(x)u(t) \right) dt \tag{5} $$

where $P^*(x)$ is the solution of an algebraic Riccati Equation (RE) as

$$ A^{*T}(x)P^*(x) + P^*(x)A^*(x) - P^*(x)B^*(x)R^{*-1}(x)B^{*T}(x)P^*(x) + Q^*(x) = 0 \tag{6} $$

This strategy has been applied in a wide variety of nonlinear
control applications, such as autopilot design [8,9], integrated
guidance and control design [10], etc.
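
The pointwise nature of the SDRE law in Eq (4) and (6) can be illustrated with a short sketch: at each state x the factored matrices are frozen, an algebraic Riccati equation is solved, and the feedback is formed. The helper name `sdre_control` and the callables passed to it are hypothetical. The only library routine relied on is SciPy's `solve_continuous_are`. This is a sketch of the general idea, not the implementation used in the cited works.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

def sdre_control(x, A_of_x, B_of_x, Q_of_x, R_of_x):
    """Evaluate the SDRE feedback of Eq (4) at a single state x (hypothetical helper).

    Each *_of_x argument is a callable returning the corresponding
    state-dependent matrix A*(x), B*(x), Q*(x) or R*(x).
    """
    A, B = A_of_x(x), B_of_x(x)
    Q, R = Q_of_x(x), R_of_x(x)
    # Solve the algebraic RE of Eq (6) with the matrices frozen at x
    P = solve_continuous_are(A, B, Q, R)
    # u = -R^{-1} B^T P x, computed with a linear solve instead of an explicit inverse
    u = -np.linalg.solve(R, B.T @ P @ x)
    return u, P
```

In a closed-loop simulation this function would be called at every integration step, which is why the approach is usually regarded as suited to slowly varying, weakly state-dependent systems.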

However, only the necessary condition for optimality has been
studied in the SDRE control strategy, and it cannot always be
established. So the optimality of the control law in the SDRE
control strategy cannot be guaranteed. Since a simplified algebraic
RE is employed to obtain $P^*$ instead of a differential RE, the
application of the SDRE control strategy is limited to Slowly
Time-Varying and Weakly State-dependent Systems. Moreover,
even though the SDRE strategy has been used in many
applications, in most cases, the coefficients $R^*$ and $Q^*$ in the
performance index $J^*$ are constant instead of state-dependent, as
shown in its formulation [7].

The NSLQR problem defined in this paper focuses on the state-dependent
R(x) and the time-dependent Q(t) in the performance
index and starts with the LTV systems. The optimality of the
solution is validated by a rigorous proof with Euler-Lagrange
Equation. The solution can be obtained by solving a PDRE
associated with the problem. The work is a special case of the
SDRE control strategy, but with rigorous mathematical proof. It
could be considered as a theoretical support for the SDRE control
strategy.

On another aspect, the solution of the optimal LTV problem is
usually obtained through numerical approximation approaches,
which can be roughly classified into offline and online methods.
The offline method usually pre-computes solutions and stores
them for fast online look-up [11,12]. Since the computation grows
exponentially with the size of the control problem, offline methods
are normally used in small- and medium-size applications. The
most prominent online methods are the active set [13] and interior
point [14] methods. The active set method performs well in large-size
cases even though its convergence rate is unknown. For the
interior point method, the reported lower iteration number is
larger than the practically observable number. In Ref. [15] and
[16], a fast gradient method is introduced to help calculate the
lower iteration bound for a quadratic LTV optimal problem with
input constraints. Though the work listed above is mainly about
the optimal problem with a time-dependent R, the formalism is
still applicable when developing the numerical solution for the
defined NSLQR problem.

1.3 Application Background 

The NSLQR problem discussed in this paper can be applied to 
model a psychological goal pursuit process, as a non-trivial 
example. 

Psychologists observe that there are two different behaviors 
when intelligent creatures pursue a goal. One is the Goal-Gradient 
Behavior (GGB) [17-20], in which the control effort to reach a 
goal increases monotonically with the proximity to the desired end 
state, such as the predator stalking behavior and the deadline 
beating behavior. Fig. 1 (a) and (b) give the normalized goal 
discrepancy and control effort of the GGB. As it is shown, with a 
monotonically increasing control energy, as the goal is approached 
the discrepancy reaches zero faster at the end of the process than it 
does at the beginning. The other is the Stuck-in-the-Middle 
Behavior (SMB) [21], in which the control effort to reach a goal is 
high at the beginning of the goal pursuit and when the desired end 
state is in sight, but it is maintained at a low level in between, such 
as what athletes do in Marathon. Part (c) and (d) in Fig. 1 show 
typically the SMB where the goal discrepancy decreases faster at 
the two ends than it is in the middle of the goal pursuit process and 
the control effort is maintained at a low level in the middle. 

Both the GGB and the SMB are different from the Conventional
Control Behavior (CCB) found in an engineering control
system, as shown in parts (e) and (f) in Fig. 1. For the CCB, the
control effort is proportional to the goal discrepancy, so the effort
decreases with proximity to the desired end. The purpose of this
paper is to study which one of these three behaviors is the best.
Some computational models of the GGB have been proposed
based on psychological interpretation [22,23]. In this paper, a
single-task goal pursuit process is modeled as an NSLQR problem
and the three behaviors are reproduced for comparison, facilitating
"a deeper understanding of mathematical characterizations of
principles of adaptive intelligence" [24] instead of psychological
interpretation.

In the sequel, Section 2 presents the necessary and sufficient
condition for the optimality of the solution to the NSLQR
problem. Section 3 analyzes the solution of a PDRE involved in
the NSLQR problem and presents a Comparison Theorem.
Section 4 verifies the feasibility of the NSLQR theory with an LTV
system and applies the NSLQR to model a goal pursuit process.
The numerical simulation results are presented to demonstrate
that the GGB and SMB save control energy and cause less stress
than the CCB in some applications. Conclusion and Future Work
are presented in Section 5.

Analysis of the Optimality of the Solution 

In this section, the main result of this paper is presented: the
necessary and sufficient condition for the optimality of the solution
to the NSLQR problem defined in Eq (1) and (2). Before that, an
Optimality Lemma is introduced. The Optimality Lemma
discusses a more general optimal problem than the
NSLQR problem. To distinguish it from the performance index $J$ in the
NSLQR, we denote the performance index in the Lemma as $J'$,
with an associated general $L'(x(t),u(t),t)$. The associated augmented
performance index is defined as $\bar{J}'$ in Eq (10), to
distinguish it from the $\bar{J}$ in the NSLQR problem. In this paper,
the overbar indicates the associated augmented performance
index, as presented on Page 379 in Ref [25].

Figure 1. Three Different Behaviors: GGB, SMB and CCB. Panel titles: (a) Goal Gradient Behavior; (c) Stuck in the Middle Behavior; (e) Conventional Control Behavior.

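Lemma 1 For the problem of finding an optimal control law u(t), for
which the performance index

$$ J'(x(t),u(t)) = \int_{t_0}^{t_f} L'(x(t),u(t),t)\,dt + \phi'(x(t_f),t_f) \tag{7} $$

is minimized along the associated closed-loop system trajectory x(t) of the LTV
system (2), with a fixed starting time $t_0$ and terminal time $t_f$, to simplify
notation, define

$$ z(t) = [x^T(t), u^T(t)]^T \tag{8} $$

$$ f'(z(t),\dot{z}(t),t) = h'(x(t),u(t),\dot{x}(t),\dot{u}(t),t) = L'(x(t),u(t),t) + \lambda'^T(t)\left(A(t)x(t) + B(t)u(t) - \dot{x}(t)\right) \tag{9} $$

and an augmented performance index as

$$ \bar{J}'(x(t),u(t)) = \int_{t_0}^{t_f} f'(z(t),\dot{z}(t),t)\,dt + \phi'(x(t_f),t_f) \tag{10} $$

where $\lambda'(t)$ is the Euler-Lagrange multiplier, with the boundary condition as

$$ \frac{\partial \phi'}{\partial x_{t_f}}(x_{t_f},t_f)\,\delta x_{t_f} + \frac{\partial f'}{\partial \dot{z}}(z,\dot{z},t)\,\delta z\Big|_{z=z^o,\,t=t_f} - \frac{\partial f'}{\partial \dot{z}}(z,\dot{z},t)\,\delta z\Big|_{z=z^o,\,t=t_0} = 0 \tag{11} $$

Then the point $z^o = [(x^o)^T,(u^o)^T]^T$ that satisfies Euler-Lagrange Equation

$$ \frac{\partial f'}{\partial z}(z,\dot{z},t)\Big|_{z=z^o} = \frac{d}{dt}\left(\frac{\partial f'}{\partial \dot{z}}(z,\dot{z},t)\right)\Big|_{z=z^o} \tag{12} $$

is the optimal solution if the augmented performance index $\bar{J}'(x,u)$ is strictly
convex in x, uniformly in u; and strictly convex in u, uniformly in x.

Here, the superscript "o" indicates the optimal solution.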

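Proof 1 It can be proven that $\bar{J}'(x,u)$ is equivalent to $J'(x,u)$ in that they
have the same minimizing function, if it exists [25]. With Euler-Lagrange
Equation (12), the variation of $\bar{J}'(x,u)$ at $z^o$ can be written as

$$ \begin{aligned} \delta\bar{J}'(x^o,u^o) &= \int_{t_0}^{t_f} \left( \frac{\partial f'}{\partial z}\delta z + \frac{\partial f'}{\partial \dot{z}}\delta\dot{z} \right)\Big|_{z=z^o} dt + \frac{\partial \phi'}{\partial x_{t_f}}(x_{t_f},t_f)\,\delta x_{t_f} \\ &= \int_{t_0}^{t_f} \left( \frac{\partial f'}{\partial z} - \frac{d}{dt}\frac{\partial f'}{\partial \dot{z}} \right)\delta z\,\Big|_{z=z^o} dt + \frac{\partial \phi'}{\partial x_{t_f}}(x_{t_f},t_f)\,\delta x_{t_f} + \frac{\partial f'}{\partial \dot{z}}\delta z\,\Big|_{z=z^o,\,t=t_0}^{z=z^o,\,t=t_f} \\ &= \frac{\partial \phi'}{\partial x_{t_f}}(x_{t_f},t_f)\,\delta x_{t_f} + \frac{\partial f'}{\partial \dot{z}}(z,\dot{z},t)\,\delta z\Big|_{z=z^o,\,t=t_f} - \frac{\partial f'}{\partial \dot{z}}(z,\dot{z},t)\,\delta z\Big|_{z=z^o,\,t=t_0} \end{aligned} \tag{13} $$

To make the point $z^o$ the optimal solution, the variation $\delta\bar{J}'(x,u)$ is
required to be 0 at $z^o$, which leads to Eq (11). However, the point $z^o$
satisfying Eq (11) and (12) can be either an extreme point or a saddle point of
$\bar{J}'(x,u)$. Now we prove by contradiction that, under the convexity constraint stated in Lemma
1, the point $z^o$ is a minimum point.

With $\bar{J}'(x,u)$ being strictly convex in x, uniformly in u, it follows that
$x^o(t)$ satisfying Eq (12) is a minimum point of $\bar{J}'(x,u)$ with respect to x,
uniformly in u. Then we have

$$ \bar{J}'(x,u) > \bar{J}'(x^o,u) \tag{14} $$

for every u(t), $t \in [t_0,t_f]$ and every $x(t) \neq x^o(t)$. Similarly, we have

$$ \bar{J}'(x,u) > \bar{J}'(x,u^o) \tag{15} $$

for every x(t), $t \in [t_0,t_f]$ and every $u(t) \neq u^o(t)$.

Assume that the point $z^o$ is a maximum point of $\bar{J}'(x,u)$. Then since
$\bar{J}'(x,u)$ is continuous, there exists an $x(t) \neq x^o(t)$, such that

$$ \bar{J}'(x,u^o) < \bar{J}'(x^o,u^o) \tag{16} $$

which contradicts Eq (14). Thus $z^o$ cannot be a maximum point of $\bar{J}'(x,u)$.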

Now assume that $z^o$ is a saddle point of $\bar{J}'(x,u)$. Then by continuity of
$\bar{J}'(x,u)$, there must be a $z^1 = [(x^1)^T,(u^1)^T]^T \neq z^o$ and a
$z^2 = [(x^2)^T,(u^2)^T]^T \neq z^o$ such that

$$ \bar{J}'(x^1,u^1) \le \bar{J}'(x^o,u^o) \le \bar{J}'(x^2,u^2) \tag{17} $$

Then with Eq (15), we have

$$ \bar{J}'(x^1,u^o) \le \bar{J}'(x^1,u^1) \le \bar{J}'(x^o,u^o) \tag{18} $$

The equality holds if and only if $x^1 = x^o$ and $u^1 = u^o$, which contradicts the
definition of $z^1$. The inequality contradicts Eq (14). Thus the point $z^o$ cannot
be a saddle point of $\bar{J}'(x,u)$ either. Then the point $z^o$ must be a minimum point
of $\bar{J}'(x,u)$. So the solution $(x^o,u^o)$ minimizes the performance index $J'$ in (7).

From the proof above, it can be said that the classical LQR is a
special case of Lemma 1. In the classical LQR theory, the
sufficiency of the optimality of the solution obtained from Euler-Lagrange
Equation is guaranteed by the convexity of the
augmented performance index $\bar{J}$ in its arguments (x,u) [25].
However, in the NSLQR, $\bar{J}$ is no longer convex in (x,u) because
of the state-dependent R(x). The theorem below shows that the
solution of Euler-Lagrange Equation is still optimal for the
NSLQR problem with a constraint on R(x).

Theorem 1 Under the convexity constraint that the function
$l(x,u) = u^T R(x) u$ is a strictly convex function in x, uniformly in u, the
state feedback control law

$$ u^o(t) = k(x^o,t) = -R^{-1}(x^o)B^T(t)P(t)x^o(t) \tag{19} $$

for the NSLQR problem defined in Eq (1) and (2), and the associated closed-loop
system trajectory $x^o(t)$ as

$$ \dot{x}^o(t) = \left[A(t) - B(t)R^{-1}(x^o)B^T(t)P(t)\right]x^o(t), \qquad x^o(t_0) = x_0 \tag{20} $$

minimize the performance index (1) if and only if the $n \times n$ matrix P(t)
satisfies the PDRE as

$$ -\dot{P}(t)x^o(t) = A^T(t)P(t)x^o(t) + P(t)A(t)x^o(t) - P(t)B(t)R^{-1}(x^o)B^T(t)P(t)x^o(t) + Q(t)x^o(t) + M(x^o,u^o) \tag{21} $$

with

$$ P(t_f) = S(t_f) \tag{22} $$

where the column vector

$$ M(x^o,u^o) = M(x,u)\big|_{(x^o,u^o)} = \frac{1}{2}\frac{\partial\left(u^T R(x) u\right)}{\partial x}\Big|_{(x^o,u^o)} \tag{23} $$

Here, the explicit x(t) on both sides of the equation can be
eliminated for some sharpened R(x). One example is discussed in
Section 3.

This theorem provides an optimal solution, which is similar to
that of the classical LQR theory, for the NSLQR problem.
However, the PDRE (21) has an additional term $M(x^o,u^o)$,
compared with the standard Riccati Equation in the LQR theory.
This term comes from the derivative of the state-dependent R(x)
with respect to x in the Euler-Lagrange Equation, as detailed in
Proof 2. Theorem 1 can be proven with Lemma 1 as follows.

Proof 2 To simplify notation, define

$$ f(z(t),\dot{z}(t),t) = h(x(t),u(t),\dot{x}(t),\dot{u}(t),t) = L(x(t),u(t),t) + \lambda^T(t)\left(A(t)x(t) + B(t)u(t) - \dot{x}(t)\right) \tag{24} $$

where

$$ z(t) = [x^T(t), u^T(t)]^T \tag{25} $$

and $\lambda(t)$ is the Euler-Lagrange multiplier. The admissible directions of z(t)
are denoted as

$$ v(t) = [\xi^T(t), \mu^T(t)]^T \tag{26} $$

with $\xi(t_0) = 0$, and $\xi(t_f)$, $\mu(t_0)$, and $\mu(t_f)$ being free, since the initial value of
x(t) is fixed. The augmented performance index is

$$ \bar{J}(x(t),u(t)) = \int_{t_0}^{t_f} f(z(t),\dot{z}(t),t)\,dt + \phi(x(t_f),t_f) \tag{27} $$

Similarly, $\bar{J}(x,u)$ is equivalent to $J(x,u)$ in that they have the same
minimizing function, if it exists [25]. Now we need to prove that $x^o(t)$, $u^o(t)$
defined in Eq (19) and (20) minimize the augmented performance index (27)
if and only if P(t) satisfies the PDRE (21) and $P(t_f) = S(t_f)$.
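We start with the necessary condition. From Euler-Lagrange Equation, it is
known that for a point $z^o = [(x^o)^T,(u^o)^T]^T$ to be an optimal solution, the
necessary condition is

$$ \frac{\partial f}{\partial z}(z,\dot{z},t)\Big|_{z=z^o} = \frac{d}{dt}\left(\frac{\partial f}{\partial \dot{z}}(z,\dot{z},t)\right)\Big|_{z=z^o} \tag{28} $$

with a boundary condition, as discussed later. For a state-dependent R(x),

$$ \frac{\partial f}{\partial z}(z,\dot{z},t)\Big|_{z=z^o} = \left[\frac{\partial h}{\partial x}(x,u,\dot{x},\dot{u},t),\ \frac{\partial h}{\partial u}(x,u,\dot{x},\dot{u},t)\right]\Big|_{(x^o,u^o)} = \left[x^T(t)Q(t) + \lambda^T(t)A(t) + M^T(x,u),\ u^T(t)R(x) + \lambda^T(t)B(t)\right]\Big|_{(x^o,u^o)} \tag{29} $$

$$ \frac{\partial f}{\partial \dot{z}}(z,\dot{z},t)\Big|_{z=z^o} = \left[\frac{\partial h}{\partial \dot{x}}(x,u,\dot{x},\dot{u},t),\ \frac{\partial h}{\partial \dot{u}}(x,u,\dot{x},\dot{u},t)\right]\Big|_{(x^o,u^o)} = \left[-\lambda^T(t),\ 0\right]\Big|_{(x^o,u^o)} \tag{30} $$

To simplify notation, the column vector $\frac{1}{2}\frac{\partial (u^T R(x) u)}{\partial x}$ is denoted as M(x,u).

Substituting the two equations above into Eq (28), we have

$$ (x^o)^T(t)Q(t) + \lambda^T(t)A(t) + M^T(x^o,u^o) = -\dot{\lambda}^T(t) \tag{31} $$

$$ (u^o)^T(t)R(x^o) + \lambda^T(t)B(t) = 0 \tag{32} $$

which leads to a control law as

$$ u^o(t) = -R^{-1}(x^o)B^T(t)\lambda(t) \tag{33} $$

with $\lambda(t)$ satisfying

$$ -\dot{\lambda}(t) = A^T(t)\lambda(t) + Q(t)x^o(t) + M(x^o,u^o) \tag{34} $$

The variation of $\bar{J}(x,u)$ at the point $z^o = [(x^o)^T,(u^o)^T]^T$ can be written as

$$ \begin{aligned} \delta\bar{J}((x,u);(\xi,\mu)) &= \int_{t_0}^{t_f} \left( \frac{\partial f}{\partial z}v + \frac{\partial f}{\partial \dot{z}}\dot{v} \right) dt + \frac{\partial \phi(x(t_f),t_f)}{\partial x(t_f)}\xi(t_f) \\ &= \int_{t_0}^{t_f} \left( \left[x^T(t)Q(t) + \lambda^T(t)A(t) + M^T(x,u),\ u^T(t)R(x) + \lambda^T(t)B(t)\right] \begin{bmatrix} \xi(t) \\ \mu(t) \end{bmatrix} + \left[-\lambda^T(t),\ 0\right] \begin{bmatrix} \dot{\xi}(t) \\ \dot{\mu}(t) \end{bmatrix} \right) dt + x^T(t_f)S(t_f)\xi(t_f) \\ &= \int_{t_0}^{t_f} \left( \left[-\dot{\lambda}^T(t),\ 0\right] \begin{bmatrix} \xi(t) \\ \mu(t) \end{bmatrix} + \left[-\lambda^T(t),\ 0\right] \begin{bmatrix} \dot{\xi}(t) \\ \dot{\mu}(t) \end{bmatrix} \right) dt + x^T(t_f)S(t_f)\xi(t_f) \\ &= -\lambda^T(t)\xi(t)\Big|_{t_0}^{t_f} + x^T(t_f)S(t_f)\xi(t_f) \\ &= \left(x^T(t_f)S(t_f) - \lambda^T(t_f)\right)\xi(t_f) \end{aligned} \tag{35} $$

To minimize $\bar{J}(x(t),u(t))$, we need to achieve $\delta\bar{J}((x,u);(\xi,\mu)) = 0$ for all
admissible directions. Since $\xi(t_0) = 0$ and $\xi(t_f)$ is free, the terminal value of $\lambda$
needs to satisfy

$$ \lambda(t_f) = S(t_f)x(t_f) \tag{36} $$

We choose $\lambda(t)$ such that it is linearly related to x(t) through
$\lambda(t) = P(t)x(t)$. Then the boundary condition Eq (36) becomes

$$ P(t_f) = S(t_f) \tag{37} $$

Substituting the assumption $\lambda(t) = P(t)x(t)$ into Eq (33) and (34), we
obtain a control law as

$$ u^o(t) = k(x^o,t) = K(t)x^o(t) = -R^{-1}(x^o)B^T(t)P(t)x^o(t) \tag{38} $$

where $K(t) = -R^{-1}(x^o)B^T(t)P(t)$, with the associated closed-loop system
as

$$ \dot{x}^o(t) = \left[A(t) - B(t)R^{-1}(x^o)B^T(t)P(t)\right]x^o(t), \qquad x^o(t_0) = x_0 \tag{39} $$

where P(t) satisfies

$$ -\dot{P}(t)x^o(t) = A^T(t)P(t)x^o(t) + P(t)A(t)x^o(t) - P(t)B(t)R^{-1}(x^o)B^T(t)P(t)x^o(t) + Q(t)x^o(t) + M(x^o,u^o) \tag{40} $$

and $P(t_f) = S(t_f)$.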
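Now we prove that the solution $(x^o,u^o)$ from the necessary condition of
optimality also satisfies the sufficiency condition using Lemma 1.

First, it is easy to verify that the solution $(x^o,u^o)$ in Eq (38) and (39) with
$P(t_f) = S(t_f)$ satisfies Eq (11) and (12) in Lemma 1. Now we prove that
$\bar{J}(x,u)$ in Theorem 1 is strictly convex in u, uniformly in x. Considering an
arbitrary fixed x, we set $\xi(t) = 0$, then the variation of $\bar{J}(x,u)$ in u is

$$ \delta\bar{J}((x,u);(0,\mu)) = \int_{t_0}^{t_f} \left( \frac{\partial f}{\partial u}\mu + \frac{\partial f}{\partial \dot{u}}\dot{\mu} \right) dt = \int_{t_0}^{t_f} \left( u^T R(x)\mu + \lambda^T B\mu \right) dt \tag{41} $$

In terms of Eq (41), we have

$$ \begin{aligned} \bar{J}((x,u)+(0,\mu)) - \bar{J}(x,u) &= \left( \int_{t_0}^{t_f} f(z + [0,\mu^T]^T, \dot{z} + [0,\dot{\mu}^T]^T, t)\,dt + \phi(x(t_f),t_f) \right) - \left( \int_{t_0}^{t_f} f(z,\dot{z},t)\,dt + \phi(x(t_f),t_f) \right) \\ &= \int_{t_0}^{t_f} \left( f(z + [0,\mu^T]^T, \dot{z} + [0,\dot{\mu}^T]^T, t) - f(z,\dot{z},t) \right) dt \\ &= \int_{t_0}^{t_f} \left( L(x,u+\mu,t) - L(x,u,t) + \lambda^T B\mu \right) dt \\ &= \int_{t_0}^{t_f} \left( \frac{1}{2}\left( (u+\mu)^T R(x)(u+\mu) - u^T R(x)u \right) + \lambda^T B\mu \right) dt \\ &= \int_{t_0}^{t_f} \left( u^T R(x)\mu + \lambda^T B\mu \right) dt + \frac{1}{2}\int_{t_0}^{t_f} \mu^T R(x)\mu\,dt \\ &= \delta\bar{J}((x,u);(0,\mu)) + \frac{1}{2}\int_{t_0}^{t_f} \mu^T R(x)\mu\,dt \ \ge\ \delta\bar{J}((x,u);(0,\mu)) \end{aligned} \tag{42} $$

Since R(x) > 0, the equality holds when and only when $\mu(t) = 0$. So $\bar{J}(x,u)$
is strictly convex in u, uniformly in x.

Now we prove that $\bar{J}(x,u)$ is strictly convex in x, uniformly in u. Setting
$\mu(t) = 0$, the variation of $\bar{J}(x,u)$ in x is written as

$$ \begin{aligned} \delta\bar{J}((x,u);(\xi,0)) &= \int_{t_0}^{t_f} \left( \frac{\partial f}{\partial x}\xi + \frac{\partial f}{\partial \dot{x}}\dot{\xi} \right) dt + \frac{\partial \phi(x(t_f),t_f)}{\partial x(t_f)}\xi(t_f) \\ &= \int_{t_0}^{t_f} \left( \left( x^T(t)Q(t) + \lambda^T(t)A(t) + M^T(x,u) \right)\xi(t) - \lambda^T(t)\dot{\xi}(t) \right) dt + x^T(t_f)S(t_f)\xi(t_f) \end{aligned} \tag{43} $$

where $M(x,u) = \frac{1}{2}\frac{\partial (u^T R(x) u)}{\partial x}$. In terms of Eq (43), we obtain

$$ \begin{aligned} \bar{J}((x,u)+(\xi,0)) - \bar{J}(x,u) &= \left( \int_{t_0}^{t_f} f(z + [\xi^T,0]^T, \dot{z} + [\dot{\xi}^T,0]^T, t)\,dt + \phi(x(t_f)+\xi(t_f),t_f) \right) - \left( \int_{t_0}^{t_f} f(z,\dot{z},t)\,dt + \phi(x(t_f),t_f) \right) \\ &= \int_{t_0}^{t_f} \left( f(z + [\xi^T,0]^T, \dot{z} + [\dot{\xi}^T,0]^T, t) - f(z,\dot{z},t) \right) dt + x^T(t_f)S(t_f)\xi(t_f) + \frac{1}{2}\xi^T(t_f)S(t_f)\xi(t_f) \\ &= \int_{t_0}^{t_f} \left( L(x+\xi,u,t) - L(x,u,t) + \lambda^T(t)A(t)\xi(t) - \lambda^T(t)\dot{\xi}(t) \right) dt + x^T(t_f)S(t_f)\xi(t_f) + \frac{1}{2}\xi^T(t_f)S(t_f)\xi(t_f) \\ &= \int_{t_0}^{t_f} \left( \left( x^T(t)Q(t) + \lambda^T(t)A(t) + M^T(x,u) \right)\xi(t) - \lambda^T(t)\dot{\xi}(t) \right) dt + x^T(t_f)S(t_f)\xi(t_f) \\ &\quad + \frac{1}{2}\int_{t_0}^{t_f} \left( \xi^T Q\xi + u^T\left( R(x+\xi) - R(x) \right)u - 2M^T\xi \right) dt + \frac{1}{2}\xi^T(t_f)S(t_f)\xi(t_f) \end{aligned} \tag{44} $$

Since the function $l(x,u) = u^T R(x) u$ is a strictly convex function in x,
uniformly in u as stated in Theorem 1, we have

$$ u^T R(x+\xi)u - u^T R(x)u \ge 2M^T\xi \tag{45} $$

Then

$$ \bar{J}((x,u)+(\xi,0)) - \bar{J}(x,u) \ \ge\ \delta\bar{J}((x,u);(\xi,0)) + \frac{1}{2}\int_{t_0}^{t_f} \xi^T Q\xi\,dt + \frac{1}{2}\xi^T(t_f)S(t_f)\xi(t_f) \ \ge\ \delta\bar{J}((x,u);(\xi,0)) \tag{46} $$

and the equality holds when and only when $\xi = 0$. So $\bar{J}(x,u)$ is strictly convex
in x, uniformly in u.

From the analysis above, it holds that the $u^o(t)$ and $x^o(t)$ defined in (19)
and (20) with the boundary condition (22) satisfy (11) and (12) in Lemma
1. In addition, the augmented performance index $\bar{J}(x,u)$ is strictly convex in x and
u separately, uniformly in the other one. We conclude that $x^o$ and $u^o$ minimize
the performance index J in Eq (1) for the NSLQR problem. So $u^o(t)$ defined
in (19) is the optimal control law for the NSLQR problem.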

From the proof of Theorem 1 and Lemma 1, we obtain a
corollary for the defined NSLQR problem with a general R(x) as
follows.

Corollary 1 For the optimal problem defined in (1) and (2) with
R(x) > 0 and free boundary values of $x(t_0)$ and $x(t_f)$, the $u^o(t)$ and $x^o(t)$
defined in (19) and (20) with P(t) satisfying (21) and (22) is either a
minimum or a saddle for the performance index J in (1).

It follows from Eq (42) that $\bar{J}$ is strictly convex in u, uniformly in
x if R(x) > 0. It then follows from the proof of Lemma 1 that
$\bar{J}(x^o,u^o)$ cannot be a maximum. Detailed proof is omitted.

Remark: The corollary signifies that the optimal control law
(19) gives the minimum cost for a particular $x(t_0)$ among all
possible control laws for a general R(x) > 0, since $u^o$ is the
minimum point of $\bar{J}(x,u)$ with respect to u. However, the cost may
be lowered if a different x(t) trajectory with a different $x(t_0)$ is
chosen, since $(x^o,u^o)$ can be a saddle of $\bar{J}$. So for a given $x(t_0)$, the
control law (19) gives the optimal solution.

From the analysis, it can be said that, for the NSLQR problem
with a general R(x) > 0, the optimal solution $(x^o,u^o)$ needs to be
evaluated for every specific $x(t_0)$, whereas in the classical LQR
problem or the SDRE problem, the optimal solution can be
explicitly written as a function of t or x, uniformly for any $x(t_0)$.
For the NSLQR to have such uniform solutions, R(x) has to satisfy
the additional constraint that the function $l(x,u) = u^T R(x) u$ is
strictly convex in x, uniformly in u.

Analysis of the Solution for PDRE 

3.1 A Sharpened R(x) 

The PDRE (21) of the NSLQR problem is different from the
RE in the SDRE control strategy literature in that

1. There is an additional term M(x,u) in the PDRE (21), which is
derived from the derivative of the state-dependent R(x) with
respect to x;

2. It is a differential RE instead of an algebraic RE;

3. The system state x(t) appears on both sides of the equation.

The way to solve the SDRE is not applicable for the PDRE. To
obtain the optimal solution of the NSLQR, it is necessary to
investigate the solution of the PDRE (21). In this section, a
sharpened R(x) is studied as an example.

As stated in Theorem 1, the function l(x,u) needs to be strictly
convex in x, so a quadratic function of x seems to be a reasonable
choice for the matrix R. So consider

$$ R(x) = x^T R_0 x\, I_{m \times m} + R_1 \tag{47} $$

where $R_0$ is an $n \times n$ symmetric, positive semi-definite constant
matrix, $R_1$ is an $m \times m$ symmetric, positive definite constant matrix
and $I_{m \times m}$ is the $m \times m$ identity matrix. The term $R_1$ guarantees
that R(x) is invertible. Then

$$ M(x,u) = \frac{1}{2}\frac{\partial (u^T R(x) u)}{\partial x} = \frac{1}{2}\frac{\partial \left( u^T (x^T R_0 x\, I_{m \times m} + R_1) u \right)}{\partial x} = u^T u R_0 x \tag{48} $$

and the PDRE (21) becomes

$$ \begin{aligned} -\dot{P}(t)x^o(t) &= A^T(t)P(t)x^o(t) + P(t)A(t)x^o(t) - P(t)B(t)R^{-1}(x^o)B^T(t)P(t)x^o(t) + Q(t)x^o(t) + u^{oT}u^o R_0 x^o(t) \\ \Leftarrow\ -\dot{P}(t) &= A^T(t)P(t) + P(t)A(t) - P(t)B(t)R^{-1}(x^o)B^T(t)P(t) + Q(t) + u^{oT}u^o R_0 \\ &= A^T(t)P(t) + P(t)A(t) - P(t)B(t)R^{-1}(x^o)B^T(t)P(t) + Q(t) + x^{oT}(t)P^T(t)B(t)R^{-1}(x^o)R^{-1}(x^o)B^T(t)P(t)x^o(t) R_0 \end{aligned} \tag{49} $$

which can be denoted as

$$ -\dot{P}(t) = A^T(t)P(t) + P(t)A(t) - P(t)B(t)R^{-1}(x^o)B^T(t)P(t) + Q'(x^o,t) \tag{50} $$

where $Q'(x^o,t) = Q(t) + x^{oT}(t)P^T(t)B(t)R^{-1}(x^o)R^{-1}(x^o)B^T(t)P(t)x^o(t) R_0 \ge Q(t) \ge 0$, and $x^o(t)$ satisfies the closed-loop system (20).
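
The closed form in Eq (48) can be checked numerically. The sketch below builds R(x) as in Eq (47), evaluates $M(x,u) = u^T u R_0 x$, and compares it with a central finite-difference gradient of $\frac{1}{2}u^T R(x) u$. All function names and the sample matrices are illustrative assumptions; $R_0$ is taken to be symmetric, as stated in the text.

```python
import numpy as np

def R_of_x(x, R0, R1):
    """R(x) = (x^T R0 x) * I_m + R1, following Eq (47)."""
    return (x @ R0 @ x) * np.eye(R1.shape[0]) + R1

def M_analytic(x, u, R0):
    """Closed form of Eq (48): M(x, u) = (u^T u) R0 x (R0 symmetric)."""
    return (u @ u) * (R0 @ x)

def M_numeric(x, u, R0, R1, eps=1e-6):
    """Central finite-difference gradient of 0.5 * u^T R(x) u with respect to x."""
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        step = np.zeros_like(x, dtype=float)
        step[i] = eps
        l_plus = u @ R_of_x(x + step, R0, R1) @ u
        l_minus = u @ R_of_x(x - step, R0, R1) @ u
        grad[i] = 0.5 * (l_plus - l_minus) / (2.0 * eps)
    return grad
```

Because $u^T R(x) u$ is quadratic in x for this choice of R(x), the central difference agrees with the closed form up to rounding error.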

Remark: The PDRE (50) is coupled with the closed-loop system
(20). In classical LQR theory, the system state x(t) and the solution of the
Riccati Equation P(t) can be obtained through a 2n-dimensional
Hamiltonian matrix [25] or by decoupling the system plant and
the Riccati Equation. However, in the NSLQR problem, the 2n-dimensional
Hamiltonian matrix is no longer linear, and the
decoupling is not applicable. The PDRE (50) has to be solved
together with the closed-loop system (20).

To generalize the results, the PDRE (50) can be rewritten into a
general form as:

$$ \dot{P}(t) + A^T(t)P(t) + P(t)A(t) + G(x,t) - P(t)V(x,t)P(t) = 0 \tag{51} $$

with a given terminal value $P(t_f)$, where x(t) is a continuous single-valued
function of t. The matrices G(x,t), V(x,t) are positive semi-definite,
symmetric, and continuous in both arguments.

For convenience of reference, the PDRE (51) is denoted as
$\mathfrak{R}(P) = 0$ in the sequel. The dependence of the variables on t is
omitted, for example, $x(t_m)$ is denoted as $x_m$. The time argument
of matrices and vectors is omitted when no misunderstanding is
introduced. For instance, $G_m$ denotes the value of matrix
$G(x_m,t_m)$.

3.2 A Comparison Theorem for the PDRE 

The propositions and theorem introduced below are derived 
from Proposition 7 and 8 in [26]. In [26], similar results are 
developed for time-dependent Riccati Equations with initial 
values. Now, a Comparison Theorem for the PDRE (51) with a 
terminal value is given. Before presenting the theorem, four 
propositions need to be established first. 

Proposition 1 Let P(t) be a symmetric solution of the PDRE (51) on
$D = T \times X$, where $T = [t_0,t_f]$, $X = \{x(t) \mid \forall t \in T, x(t) \in \mathbb{R}^n\}$.

(1) If U(t) is a symmetric solution of the inequality $\mathfrak{R}(U) > 0$ on
D such that $U(t_f) < P(t_f)$, then $U(t) < P(t)$ on D.

(2) If W(t) is a symmetric solution of the inequality $\mathfrak{R}(W) < 0$ on
D such that $W(t_f) > P(t_f)$, then $W(t) > P(t)$ on D.

Proof 3 Since the matrices and vectors in the PDRE (51) are continuous
in the arguments, P(t), U(t) and W(t) are continuous in t. We start with
part (1) of the proposition.

Suppose that $U(t) < P(t)$ does not hold on D; then there must be a
time $t_m \in [t_0,t_f)$ by the Mean Value Theorem, such that

$$ U(t_m) = P(t_m) \tag{52} $$

and

$$ U(t) < P(t) \tag{53} $$

for any $t \in (t_m,t_f]$, as shown in Fig. 2. Let

$$ \varphi(t) = p^T\left(P(t) - U(t)\right)p, \qquad t \in [t_m,t_f] \tag{54} $$

where p is a non-zero constant vector. Then

$$ \varphi(t) \ge 0, \qquad t \in [t_m,t_f] \tag{55} $$

and the equality holds if and only if $t = t_m$. Moreover,

$$ \begin{aligned} \dot{\varphi}(t_m) &= p^T\left(\dot{P}(t_m) - \dot{U}(t_m)\right)p \\ &= p^T\dot{P}(t_m)p - p^T\dot{U}(t_m)p \\ &= p^T\mathfrak{R}(P_m)p - p^T\mathfrak{R}(U_m)p \\ &= 0 - p^T\mathfrak{R}(U_m)p < 0 \end{aligned} \tag{56} $$

Therefore, $\varphi(t) < 0$ in a right neighborhood of $t_m$, which
contradicts Eq (55). So $U(t) < P(t)$ holds on D.
Part (2) can be proven similarly.

Figure 2. Trajectories of P(t) and U(t).
doi:10.1371/journal.pone.0094925.g002

Proposition 2 Let P(t) be a symmetric solution of the PDRE (51) on
$D = T \times X$, where $T = [t_0,t_f]$, $X = \{x(t) \mid \forall t \in T, x(t) \in \mathbb{R}^n\}$.

(1) If U(t) is a symmetric solution of the inequality $\mathfrak{R}(U) > 0$ on
D such that $U(t_f) \le P(t_f)$, then $U(t) \le P(t)$ on D.

(2) If W(t) is a symmetric solution of the inequality $\mathfrak{R}(W) < 0$ on
D such that $W(t_f) \ge P(t_f)$, then $W(t) \ge P(t)$ on D.

Proof 4 The inequality part of the proposition has been proven in
Proposition 1. The following part is the proof for the case in which
$U(t_f) = P(t_f)$. We start with part (1).

As discussed in Proof 3, P(t), U(t) and W(t) are continuous
in t. Assume that $U(t) \le P(t)$ does not hold on D when
$U(t_f) = P(t_f)$; then there must be a period $[t_m,t_n] \subseteq [t_0,t_f]$, such that

$$ U(t) > P(t) \tag{57} $$

for any $t \in [t_m,t_n)$, and

$$ U(t_n) = P(t_n) \tag{58} $$

Let

$$ \varphi(t) = p^T\left(P(t) - U(t)\right)p, \qquad t \in [t_m,t_n] \tag{59} $$

where p is a non-zero constant vector. Then

$$ \varphi(t) \le 0, \qquad t \in [t_m,t_n] \tag{60} $$

and the equality holds if and only if $t = t_n$. Moreover,

$$ \begin{aligned} \dot{\varphi}(t_n) &= p^T\left(\dot{P}(t_n) - \dot{U}(t_n)\right)p \\ &= p^T\dot{P}(t_n)p - p^T\dot{U}(t_n)p \\ &= p^T\mathfrak{R}(P_n)p - p^T\mathfrak{R}(U_n)p \\ &= 0 - p^T\mathfrak{R}(U_n)p < 0 \end{aligned} \tag{61} $$

Therefore, $\varphi(t) > 0$ in a left neighborhood of $t_n$, which contradicts
Eq (60). So $U(t) \le P(t)$ holds on D.
Part (2) can be proven similarly.

Proposition 3 Let P(t) be a symmetric solution of the PDRE (51) on
$D = T \times X$, where $T = [t_0,t_f]$, $X = \{x(t) \mid \forall t \in T, x(t) \in \mathbb{R}^n\}$.

(1) If U(t) is a symmetric solution of the inequality $\mathfrak{R}(U) \ge 0$ on
D such that $U(t_f) < P(t_f)$, then $U(t) < P(t)$ on D.

(2) If W(t) is a symmetric solution of the inequality $\mathfrak{R}(W) \le 0$ on
D such that $W(t_f) > P(t_f)$, then $W(t) > P(t)$ on D.

Proposition 4 Let P(t) be a symmetric solution of the PDRE (51) on
$D = T \times X$, where $T = [t_0,t_f]$, $X = \{x(t) \mid \forall t \in T, x(t) \in \mathbb{R}^n\}$.

(1) If U(t) is a symmetric solution of the inequality $\mathfrak{R}(U) \ge 0$ on
D such that $U(t_f) \le P(t_f)$, then $U(t) \le P(t)$ on D.

(2) If W(t) is a symmetric solution of the inequality $\mathfrak{R}(W) \le 0$ on
D such that $W(t_f) \ge P(t_f)$, then $W(t) \ge P(t)$ on D.

The proofs of these two propositions are similar to those of
Propositions 1 and 2.

These four propositions give a boundary estimation for the
solution of the PDRE. Based on the four propositions discussed
above, a Comparison Theorem is introduced for the PDRE (51).

Theorem 2 For $i \in \{1,2\}$, let $P_i(t)$ be the solution of the PDRE

$$ -\dot{P}_i(t) = A_i^T(t)P_i(t) + P_i(t)A_i(t) + G_i(x,t) - P_i(t)V_i(x,t)P_i(t) \tag{62} $$

on $D = T \times X$, where $T = [t_0,t_f]$, $X = \{x(t) \mid \forall t \in T, x(t) \in \mathbb{R}^n\}$. If
$P_1(t_f) \le P_2(t_f)$, and

$$ \begin{bmatrix} G_2(x,t) & A_2^T(t) \\ A_2(t) & -V_2(x,t) \end{bmatrix} \ge \begin{bmatrix} G_1(x,t) & A_1^T(t) \\ A_1(t) & -V_1(x,t) \end{bmatrix}, \qquad (t,x) \in D \tag{63} $$

then $P_1(t) \le P_2(t)$ on D.

Proof 5 Let $P_0(t) = P_2(t) - P_1(t)$, then $P_0(t_f) \ge 0$.

$$ \begin{aligned} \dot{P}_0(t) &= \dot{P}_2(t) - \dot{P}_1(t) \\ &= -A_2^T(t)P_2(t) - P_2(t)A_2(t) - G_2(x,t) + P_2(t)V_2(x,t)P_2(t) - \left\{ -A_1^T(t)P_1(t) - P_1(t)A_1(t) - G_1(x,t) + P_1(t)V_1(x,t)P_1(t) \right\} \\ &= (P_2 - P_1)V_2(P_2 - P_1) + P_1V_2(P_2 - P_1) + (P_2 - P_1)V_2P_1 + P_1(V_2 - V_1)P_1 \\ &\quad - A_2^T(P_2 - P_1) - (P_2 - P_1)A_2 - (A_2 - A_1)^TP_1 - P_1(A_2 - A_1) - (G_2 - G_1) \\ &= -\tilde{A}^TP_0 - P_0\tilde{A} - \left[I_{n \times n}, P_1\right] \begin{bmatrix} G_2 - G_1 & A_2^T - A_1^T \\ A_2 - A_1 & -V_2 - (-V_1) \end{bmatrix} \begin{bmatrix} I_{n \times n} \\ P_1 \end{bmatrix} \end{aligned} \tag{64} $$

where $I_{n \times n}$ is the $n \times n$ identity matrix and $\tilde{A} = A_2 - V_2P_1 - \frac{1}{2}V_2P_0$.
Since

$$ \left[I_{n \times n}, P_1\right] \begin{bmatrix} G_2 - G_1 & A_2^T - A_1^T \\ A_2 - A_1 & -V_2 - (-V_1) \end{bmatrix} \begin{bmatrix} I_{n \times n} \\ P_1 \end{bmatrix} \ge 0 \tag{65} $$

for the PDRE $\mathfrak{R}'(P_0) = 0$ defined as

$$ \dot{P}_0(x,t) = -\tilde{A}^TP_0 - P_0\tilde{A} - \left[I_{n \times n}, P_1\right] \begin{bmatrix} G_2 - G_1 & A_2^T - A_1^T \\ A_2 - A_1 & -V_2 - (-V_1) \end{bmatrix} \begin{bmatrix} I_{n \times n} \\ P_1 \end{bmatrix} \tag{66} $$

we have $\mathfrak{R}'(P_0) = 0$, $\mathfrak{R}'(0) \ge 0$, $P_0(t_f) \ge 0$. It is readily concluded that
$P_0(t) = P_2(t) - P_1(t) \ge 0$ for all $(t,x) \in D$ by Proposition 4. So $P_1(t) \le P_2(t)$ on D.
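
The ordering asserted by the Comparison Theorem can be observed numerically in the scalar case. The following sketch integrates two scalar equations of the form (62) backward from a common terminal value with a fixed-step fourth-order Runge-Kutta scheme. The helper `riccati_backward` and the constant coefficients used in the usage note are hypothetical choices that satisfy condition (63); they are not taken from the paper.

```python
import numpy as np

def riccati_backward(a, g, v, p_tf, t0=0.0, tf=1.0, n=1000):
    """Integrate the scalar equation -dp/dt = 2 a(t) p + g(t) - v(t) p^2
    backward from p(tf) = p_tf with fixed-step RK4 (hypothetical helper).

    a, g, v are callables of t. Returns (ts, ps) with ts ascending.
    """
    h = (tf - t0) / n

    def rate(t, p):
        # dp/d(tau) with tau = tf - t, i.e. the negative of dp/dt
        return 2.0 * a(t) * p + g(t) - v(t) * p * p

    ps = np.empty(n + 1)
    ps[n] = p_tf
    for k in range(n, 0, -1):
        t = t0 + k * h
        p = ps[k]
        # One RK4 step from time t back to time t - h
        k1 = rate(t, p)
        k2 = rate(t - h / 2.0, p + h / 2.0 * k1)
        k3 = rate(t - h / 2.0, p + h / 2.0 * k2)
        k4 = rate(t - h, p + h * k3)
        ps[k - 1] = p + h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    ts = t0 + h * np.arange(n + 1)
    return ts, ps
```

For instance, with a common a = -1, the pair (g_1, v_1) = (1, 1) and (g_2, v_2) = (2, 0.5) satisfies g_2 >= g_1 and v_1 >= v_2, so the computed p_1(t) should stay below p_2(t) on the whole interval when both start from p(t_f) = 0.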

3.3 Application of Comparison Theorem 

For the PDRE (50), it is readily verified that $Q'(x^o,t) = Q(t) + u^{oT}(t)u^o(t)R_0 \ge Q(t)$ and $B(t)R^{-1}(x^o)B^T(t) \ge 0$. Then we have

$$\begin{bmatrix} Q'(x^o,t) & A^T(t) \\ A(t) & -B(t)R^{-1}(x^o)B^T(t) \end{bmatrix} \ge \begin{bmatrix} Q(t) & A^T(t) \\ A(t) & -B(t)R^{-1}(x^o)B^T(t) \end{bmatrix}, \quad (t,x) \in D \tag{67}$$

and

$$\begin{bmatrix} Q'(x^o,t) & A^T(t) \\ A(t) & 0 \end{bmatrix} \ge \begin{bmatrix} Q'(x^o,t) & A^T(t) \\ A(t) & -B(t)R^{-1}(x^o)B^T(t) \end{bmatrix}, \quad (t,x) \in D. \tag{68}$$

With Theorem 2 in Section 3.2, it follows that the solution $P(t)$ of the PDRE (50) satisfies

$$P_1(t) \le P(t) \le P_2(t) \tag{69}$$

where $P_1(t)$ is the solution of

$$-\dot{P}_1(t) = A^T(t)P_1(t) + P_1(t)A(t) - P_1(t)B(t)R^{-1}(x)B^T(t)P_1(t) + Q(t), \quad P_1(t_f) = S(t_f) \tag{70}$$

and $P_2(t)$ is the solution of

$$-\dot{P}_2(t) = A^T(t)P_2(t) + P_2(t)A(t) + Q'(x,t), \quad P_2(t_f) = S(t_f). \tag{71}$$

These two equations are differential REs, which are similar to the algebraic RE in the SDRE control strategy. The analysis shows that the solution of a PDRE can be bounded by the solutions of two differential REs. Thus the methods for solving a differential RE can be borrowed to facilitate the solution of a PDRE, for example in determining the initial value of $P(t)$.

Table 1. The Values of Parameters in Simulation Case 01.

| Case | $x_0$ | $x_f$ | $P_0$ | $P_f$ | $u(t)$ | $P(t)$ | $J_1$ |
|---|---|---|---|---|---|---|---|
| Optimal Sol. | 1 | 0.413 | 0.5 | 3.7e-15 | $u^o(t)$ | PDRE | 0.237653 |
| Riccati Per. | 1 | 0.417 | 0.5 | 3.3e-08 | $u^o(t)$ | Diff. RE* | 0.237672 |
| Control Per. | 1 | 0.421 | 0.5 | 3.7e-15 | $u^o(t) + 0.01$ | PDRE | 0.237695 |

*Diff. RE: $-\dot{P}(t) = A^T(t)P(t) + P(t)A(t) - P(t)B(t)R^{-1}(t)B^T(t)P(t) + Q(t)$.
doi:10.1371/journal.pone.0094925.t001
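The bounds in Eq (69) can be illustrated numerically in the scalar case. The sketch below uses the Case 01 values from Section 4.1.1 ($A(t) = -1/(1+t)$, $B = 1$, $Q = 1$, $S = 0$, $R(x) = 4x^2$ so that $R_0 = 4$) and takes $P_0 = 0.5$ from Table 1. It assumes the control law $u^o = -BPx/R(x)$ and a PDRE of the form of Eq (62) with $G = Q'$ and $V = BR^{-1}B$; both are assumptions, since Eq (19) and Eq (50) are not restated here. All function names are hypothetical.

```python
def rk4_step(f, t, y, h):
    """One classical fourth-order Runge-Kutta step for y' = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, [a + h / 2 * b for a, b in zip(y, k1)])
    k3 = f(t + h / 2, [a + h / 2 * b for a, b in zip(y, k2)])
    k4 = f(t + h, [a + h * b for a, b in zip(y, k3)])
    return [a + h / 6 * (p + 2 * q + 2 * r + s)
            for a, p, q, r, s in zip(y, k1, k2, k3, k4)]


def A(t):
    return -1.0 / (1.0 + t)


def terms(x, P):
    """Return (u, Q', V) for R(x) = 4 x^2, B = 1, Q = 1 (assumed control law)."""
    R = 4.0 * x * x
    u = -P * x / R
    return u, 1.0 + 4.0 * u * u, 1.0 / R


def forward(t, y):
    """Closed-loop state x and PDRE solution P."""
    x, P = y
    u, q_prime, v = terms(x, P)
    return [A(t) * x + u, -(2 * A(t) * P + q_prime - P * v * P)]


def with_bounds(t, y):
    """(x, P) plus the lower bound P1 of Eq (70) and upper bound P2 of Eq (71)."""
    x, P, P1, P2 = y
    u, q_prime, v = terms(x, P)
    return [A(t) * x + u,
            -(2 * A(t) * P + q_prime - P * v * P),
            -(2 * A(t) * P1 + 1.0 - P1 * v * P1),
            -(2 * A(t) * P2 + q_prime)]


h, n = 0.01, 100

# Forward pass from (x0, P0) = (1, 0.5) to obtain x(t_f).
t, y = 0.0, [1.0, 0.5]
for _ in range(n):
    y = rk4_step(forward, t, y, h)
    t += h
x_f = y[0]

# Backward pass from t_f, where P = P1 = P2 = S = 0.
t, y = 1.0, [x_f, 0.0, 0.0, 0.0]
samples = [(t, y[1], y[2], y[3])]
for _ in range(n):
    y = rk4_step(with_bounds, t, y, -h)
    t -= h
    samples.append((t, y[1], y[2], y[3]))

bounds_hold = all(P1 <= P + 1e-9 and P <= P2 + 1e-9 for _, P, P1, P2 in samples)
print("P1 <= P <= P2 on the grid:", bounds_hold)
```

The three equations are integrated backward together so that they share the same trajectory $x(t)$, which is what Eq (70) and Eq (71) require when $R$ and $Q'$ depend on the state.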

Simulation 

In this section, the NSLQR problem studied above is applied to
two specific simulation cases, first to verify the optimality
numerically and then to model the goal pursuit process introduced
in the Introduction, so that the three goal pursuit
behaviors, GGB, SMB and CCB, are reproduced for further
study.

The NSLQR problem is, technically, a Two-Point Boundary
Value (TPBV) problem, since the initial value $x_0$ of the system plant
(20) and the terminal value $P_f$ of the PDRE (21) are known. A
shooting method with secant iteration is employed in the simulation
to solve this TPBV problem [27]. With the chosen $R(x)$, the
convergence of the method is slightly slower than second-order,
as discussed in [28] and [29]. The "ode4 Runge-Kutta" solver is
used in the simulation with a fixed step of 0.01
second. The simulation error threshold is set to $10^{-6}$.
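The shooting procedure described above can be sketched for a scalar problem as follows. This is a minimal illustration, not the authors' implementation: the closed-loop equations are derived here from the Pontryagin conditions with the costate written as $\lambda = Px$ and the control as $u = -BPx/R(x)$, and may differ in form from Eq (19) and Eq (21). The function names, the initial guesses and the use of the Case 01 values of Section 4.1.1 are assumptions.

```python
def rk4_step(f, t, y, h):
    """One classical fourth-order Runge-Kutta step for y' = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, [a + h / 2 * b for a, b in zip(y, k1)])
    k3 = f(t + h / 2, [a + h / 2 * b for a, b in zip(y, k2)])
    k4 = f(t + h, [a + h * b for a, b in zip(y, k3)])
    return [a + h / 6 * (p + 2 * q + 2 * r + s)
            for a, p, q, r, s in zip(y, k1, k2, k3, k4)]


def make_rhs(A, B, Q, R, dR):
    """Closed-loop dynamics of (x, P) for a scalar problem (assumed form)."""
    def rhs(t, y):
        x, P = y
        r = R(x)
        u = -B * P * x / r
        dx = A(t) * x + B * u
        dP = (-Q - 2 * A(t) * P + (B * P) ** 2 / r
              - 0.5 * dR(x) * x * (B * P / r) ** 2)
        return [dx, dP]
    return rhs


def integrate(rhs, x0, p0, t0, tf, h=0.01):
    """Integrate forward with a fixed step and return [x(tf), P(tf)]."""
    t, y = t0, [x0, p0]
    for _ in range(round((tf - t0) / h)):
        y = rk4_step(rhs, t, y, h)
        t += h
    return y


def shoot(rhs, x0, S, t0, tf, guess_a, guess_b, tol=1e-6, max_iter=50):
    """Secant iteration on P(t0) until |P(tf) - S| falls below tol."""
    def residual(p0):
        return integrate(rhs, x0, p0, t0, tf)[1] - S

    p_prev, p_cur = guess_a, guess_b
    r_prev, r_cur = residual(p_prev), residual(p_cur)
    for _ in range(max_iter):
        if abs(r_cur) < tol:
            return p_cur
        if r_cur == r_prev:
            raise RuntimeError("secant step is undefined (flat residual)")
        p_next = p_cur - r_cur * (p_cur - p_prev) / (r_cur - r_prev)
        p_prev, r_prev = p_cur, r_cur
        p_cur, r_cur = p_next, residual(p_next)
    raise RuntimeError("shooting did not converge")


# Case 01 values: A(t) = -1/(1+t), B = 1, Q = 1, S = 0, R(x) = 4 x^2.
case01 = make_rhs(A=lambda t: -1.0 / (1.0 + t), B=1.0, Q=1.0,
                  R=lambda x: 4.0 * x * x, dR=lambda x: 8.0 * x)
p0 = shoot(case01, x0=1.0, S=0.0, t0=0.0, tf=1.0, guess_a=0.3, guess_b=0.6)
x_f, p_f = integrate(case01, 1.0, p0, 0.0, 1.0)
print(p0, x_f, p_f)
```

For these values the recovered $P(t_0)$ is close to 0.5, in line with the $P_0$ reported in Table 1. The initial guesses are kept small on purpose: since $R(x) = 4x^2$ vanishes at $x = 0$, a guess that drives the state through zero makes the control law singular.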

4.1 Numerical Verification of the NSLQR Optimality 

4.1.1 Simulation Model. In this section, we consider a
specific optimal control problem of seeking a control law $u(t)$ to minimize
the given performance index

$$J_1 = \frac{1}{2}\int_{t_0=0}^{t_f=1} \left(1\cdot x^2(t) + 4\cdot x^2(t)\cdot u^2(t)\right)dt + \frac{1}{2}\cdot 0\cdot x^2(t_f) \tag{72}$$

along the first-order LTV system

$$\dot{x}(t) = \frac{-1}{1+t}\,x(t) + 1\cdot u(t), \quad x(t_0) = 1 \tag{73}$$

with $A = \frac{-1}{1+t}$, $B = 1$, $Q = 1$, $S(t_f) = 0$ and $R = 4x^2(t)$.

Three forms of control law are considered. The first is the
optimal control law $u^o(t)$ defined in Eq (19) with $P(t)$ from the
PDRE (21). The second is the same $u^o(t)$ in Eq (19) but
with $P(t)$ from a standard Differential Riccati Equation, which
has no additional term $M(x,u)$. The third is the
same $u^o(t)$ in Eq (19) with a perturbation of 0.01. To distinguish
them, the three control laws are named "Optimal
Solution", "Riccati Perturbation" and "Control Perturbation",
respectively. Table 1 summarizes the parameters used in this
simulation case as well as the values of the performance index $J_1$. Fig. 3
gives the system behaviors with the three different control laws.
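The comparison between the "Optimal Solution" and "Control Perturbation" cases can be sketched by accumulating the running cost of Eq (72) along the closed-loop trajectory. The code below is an illustration under assumptions: $P_0 = 0.5$ is taken from Table 1, the control law is assumed to be $u^o = -BPx/R(x)$, and for $R = 4x^2$ the scalar equation for $P$ derived from the Pontryagin conditions reduces to $\dot{P} = -1 - 2A(t)P$. The absolute values depend on the discretization and on the exact form of Eq (19) and Eq (21), so they are not expected to match Table 1 digit for digit; the ordering of the two costs is the point. The "Riccati Perturbation" case is omitted because it needs its own shooting pass.

```python
def rk4_step(f, t, y, h):
    """One classical fourth-order Runge-Kutta step for y' = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, [a + h / 2 * b for a, b in zip(y, k1)])
    k3 = f(t + h / 2, [a + h / 2 * b for a, b in zip(y, k2)])
    k4 = f(t + h, [a + h * b for a, b in zip(y, k3)])
    return [a + h / 6 * (p + 2 * q + 2 * r + s)
            for a, p, q, r, s in zip(y, k1, k2, k3, k4)]


def A(t):
    return -1.0 / (1.0 + t)


def make_rhs(offset):
    """Dynamics of (x, P, accumulated cost) with u = u_opt + offset."""
    def rhs(t, y):
        x, P, _cost = y
        R = 4.0 * x * x
        u = -P * x / R + offset          # assumed feedback law plus perturbation
        dx = A(t) * x + u
        dP = -1.0 - 2.0 * A(t) * P       # assumed reduced scalar PDRE for R = 4x^2
        dcost = 0.5 * (x * x + R * u * u)  # integrand of Eq (72)
        return [dx, dP, dcost]
    return rhs


def run(offset, p0=0.5, h=0.01, steps=100):
    """Return (x(t_f), J_1) on [0, 1]; S(t_f) = 0, so there is no terminal cost."""
    rhs = make_rhs(offset)
    t, y = 0.0, [1.0, p0, 0.0]
    for _ in range(steps):
        y = rk4_step(rhs, t, y, h)
        t += h
    return y[0], y[2]


x_opt, j_opt = run(0.0)
x_per, j_per = run(0.01)
print("optimal:", j_opt, "perturbed:", j_per)
```

Carrying the cost as a third state lets the same fixed-step integrator handle the quadrature, so the two cases are compared at the same numerical accuracy.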

4.1.2 Discussion of Results. Fig. 3 shows
that the three control laws, with the same function $R(x)$, all
give the system state a stable behavior. However, the optimal
control law $u^o(t)$ yields the minimal performance index $J_1$, as
shown in Table 1. Although a finite number of numerical examples
cannot establish with great confidence that the control law $u^o(t)$
is optimal, the results still show that the
classical LQR solution, which is the "Riccati Perturbation" case,
gives a greater value of $J_1$ than the optimal control law $u^o(t)$
does, so the classical LQR theory is no longer applicable to the NSLQR
problem. Moreover, the NSLQR theory does provide
the minimal value of $J_1$ among all three control laws, which
verifies the optimality of the NSLQR theory to some degree.

4.2 Application of the NSLQR to Goal Pursuit Processes 

4.2.1 Modeling Goal Pursuit Behaviors. From a psychological
perspective, the system state $x(t)$ in the NSLQR problem
represents the goal discrepancy. The parameter $A(t)$ in system model
(2) represents the goal attraction. For a constant $A$, the goal is
attractive if all the eigenvalues have negative real parts
(asymptotically stable); the goal is neutral if the eigenvalues have
non-positive real parts and those with zero real parts are
simple (marginally stable); otherwise
(unstable), the goal is repulsive. Similar interpretations apply to
time-varying $A(t)$, where asymptotic stability, marginal
stability and instability can be interpreted as attractive, neutral
and repulsive goals. The input $u(t)$ represents the level of control
effort, while the parameter $B(t)$ is treated as control effectiveness. In the
performance index $J$, the weighting coefficient $Q(t)$ functions as the
goal discrepancy penalty: a greater value of $\|Q\|$ results in less
discrepancy. The weighting coefficient $S(t_f)$ is known as the
terminal penalty: a greater value of $\|S\|$ leads to a smaller terminal
goal discrepancy. The weighting coefficient $R(x)$ is the control energy
penalty, which depends on the goal discrepancy: a greater value of $\|R\|$
means less control energy expenditure is allowed.
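The eigenvalue criterion for a constant $A$ can be written out for a $2 \times 2$ matrix as follows. This is a small sketch with hypothetical function names; it applies the rule exactly as stated in the text, so a repeated eigenvalue on the imaginary axis is labeled repulsive even when the matrix happens to be diagonalizable.

```python
import cmath


def eigenvalues_2x2(a11, a12, a21, a22):
    """Eigenvalues of [[a11, a12], [a21, a22]] from the characteristic polynomial."""
    trace = a11 + a22
    det = a11 * a22 - a12 * a21
    disc = cmath.sqrt(trace * trace - 4 * det)
    return (trace + disc) / 2, (trace - disc) / 2


def classify_goal(a11, a12, a21, a22, tol=1e-12):
    """Label the goal as attractive, neutral or repulsive for a constant 2x2 A."""
    lam1, lam2 = eigenvalues_2x2(a11, a12, a21, a22)
    if lam1.real < -tol and lam2.real < -tol:
        return "attractive"          # asymptotically stable
    if lam1.real > tol or lam2.real > tol:
        return "repulsive"           # unstable
    # Non-positive real parts with at least one eigenvalue on the imaginary axis.
    on_axis = [lam for lam in (lam1, lam2) if abs(lam.real) <= tol]
    simple = len(on_axis) == 1 or abs(on_axis[0] - on_axis[1]) > tol
    return "neutral" if simple else "repulsive"


print(classify_goal(-1.0, 0.0, 0.0, -2.0))   # two negative eigenvalues
print(classify_goal(0.0, 1.0, -1.0, 0.0))    # eigenvalues +j and -j
print(classify_goal(0.0, 1.0, 0.0, 0.0))     # double eigenvalue at zero
```

The tolerance `tol` is an arbitrary choice for deciding when a real part counts as zero in floating-point arithmetic.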
Figure 3. System Behavior of Simulation Case 01: a) Control Energy Penalty R vs. Goal Discrepancy x; b) Goal Discrepancy x.

doi:10.1371/journal.pone.0094925.g003



4.2.2 Simulation Model. As an initial study, we consider a
first-order linear goal attainment process in the simulation case.
The process is set as

$$\dot{x}(t) = 0\cdot x(t) + 1\cdot u(t), \quad x(t_0) = 1 \tag{74}$$

which is a neutrally attractive goal process. The parameter $R(x)$ is
the concern in the current simulation case study. Based on the
analysis above, we hypothesize that the GGB can be produced by
an $R(x)$ that is a monotonically increasing function of $\|x\|$; the SMB
can be produced by an $R(x)$ that is a hump function of $\|x\|$; and the
CCB can be produced by a constant $R$. Table 2 gives the values of
the parameters used in the simulation. The simulation results are
presented in Table 3 and Fig. 4. From Table 2, it can be seen that
the choice of $R(x)$ in the GGB satisfies the convexity constraint on
the function $l(x,u)$, so the solution is optimal for the GGB. Even
though the convexity constraint is not satisfied in the SMB,
it can still be said from Corollary 1 that the solution is optimal since
$x(t_0)$ is fixed. For a meaningful comparison, the parameters are
adjusted such that the three behaviors achieve roughly equal terminal
values, as shown by the values of $x_f$ in Table 3.

From Fig. 4 Part d), it can be seen that the NSLQR method has
successfully modeled the goal pursuit process with three behaviors.
Part a) and Part b) show the control energy penalty matrix $R$ vs.
time $t$ and goal discrepancy $x$, respectively. The hypothesis is
validated: when $R$ is a monotonically increasing function of
$\|x\|$, the goal pursuit process can exhibit the GGB; when $R$ is a
hump function of $\|x\|$, the goal pursuit process can exhibit the
SMB; and when $R$ is a constant, the goal pursuit process can
exhibit the CCB. Part e) and Part f) present the control effort $u$ vs.
time $t$ and goal discrepancy $x$, respectively. For the CCB, the
control effort decreases as the state approaches the goal, that is, it
increases with goal discrepancy; for the GGB, the
control effort decreases with goal discrepancy; and for the SMB,
the control effort is higher at the two ends than in the middle.

Table 3 lists some selected norms of the goal discrepancy $x(t)$
and the control effort $u(t)$. It shows that, with the same initial value
$x_0$ and terminal value $x_f$, the CCB features the least accumulated
error $\|x\|_1$, but consumes the most control energy $\|u\|_2$ and suffers
from the highest stress level $\|u\|_\infty$. The SMB consumes the least
control energy and suffers from the lowest stress level at the price
of a higher accumulated error. The GGB is in between these
two in control energy and stress level, but has the largest accumulated error.

4.2.3 Discussion of Results. Based on the simulation results
above, it is concluded that in pursuing a goal with a finite terminal
time (deadline), the GGB and the SMB may save
control energy and reduce stress level compared with the CCB. However, the
CCB has the least accumulated error. So the GGB and SMB may
be beneficial in applications where only the level of goal
attainment at the terminal time is of concern, such as a deadline
beating process. However, the GGB or SMB would not be a
preferred choice when the goal needs to be maintained over a long
time or needs to be approached smoothly.

Table 2. The Values of Parameters in Simulation Case 02.









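| Parameter | Symbol | GGB | SMB | CCB |
|---|---|---|---|---|
| Goal Discrepancy Penalty | $Q$ | 1 | 1 | 23.5 |
| Terminal Penalty | $S$ | 10 | 10 | 0 |
| Control Energy Penalty | $R$ | $3.7(x^2 + 0.1)$ | $2.2(0.5 - (x - 0.5)^2)$ | 2 |
| Simulation Error Threshold | $E$ | 0.000001 | 0.000001 | 0.000001 |
| Initial Value of $x(t)$ | $x_0$ | 1 | 1 | 1 |
| Initial Value of $P(t)$ | $P_0$ | 2.831 | 0.903 | 6.841 |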
Figure 4. Three Goal Pursuit Behaviors: a) Control Energy Penalty R vs. Time; b) Control Energy Penalty R vs. Goal Discrepancy x; c)
Feedback Gain K; d) Goal Discrepancy x; e) Control Effort u vs. Time; f) Control Effort u vs. Goal Discrepancy x.

doi:10.1371/journal.pone.0094925.g004






Table 3. Simulation Results of Three Goal Pursuit Behaviors.

| Parameter | Symbol | GGB | SMB | CCB |
|---|---|---|---|---|
| Terminal Value of $x(t)$ | $x_f$ | 0.064 | 0.064 | 0.064 |
| Terminal Value of $P(t)$ | $P_f$ | 10.000 | 10.000 | 0.000 |
| Accumulated Error | $\lVert x \rVert_1$ | 59.847 | 49.183 | 29.680 |
| Control Action | $\lVert u \rVert_1$ | 96.937 | 95.770 | 95.180 |
| Control Energy | $\lVert u \rVert_2$ | 9.977 | 9.730 | 13.207 |
| Stress Level | $\lVert u \rVert_\infty$ | 1.651 | 1.643 | 3.421 |

doi:10.1371/journal.pone.0094925.t003
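The norms in Table 3 can be computed from sampled trajectories along the following lines. The table does not state how the norms are discretized; the magnitudes (for example an accumulated error of 30 to 60 for a signal bounded by 1 on a one-second horizon sampled every 0.01 second) suggest plain vector norms over the samples, which is an inference rather than something the text confirms. The function name and the sample values are hypothetical.

```python
import math


def signal_norms(samples):
    """Vector 1-, 2- and infinity-norms of a sampled signal."""
    magnitudes = [abs(v) for v in samples]
    return {
        "l1": sum(magnitudes),                              # accumulated error / control action
        "l2": math.sqrt(sum(m * m for m in magnitudes)),    # control energy
        "linf": max(magnitudes),                            # stress level (peak effort)
    }


# Tiny illustrative signal, not taken from the simulation.
norms = signal_norms([3.0, -4.0])
print(norms)
```

If integral norms were intended instead, each sum would be weighted by the step size, which rescales the entries of a column but not the ordering of the three behaviors.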



Conclusion and Future Work 

In this paper, a necessary and sufficient Optimality Theorem
with a rigorous mathematical proof for the NSLQR problem with a
convexity constraint on $R(x)$ is presented. It is also argued that for a
given $x(t_0)$, the NSLQR gives an optimal solution. A Comparison
Theorem for the solution of the PDRE with a general form for the
NSLQR problem is presented as well. In the simulation, the
NSLQR is first applied to a first-order LTV system to verify the
proposed theory. The NSLQR is then used to model two
psychological behaviors (GGB and SMB) in goal pursuit processes
identified from psychology, along with the typical behavior of
engineering control systems (CCB), by employing different control
energy weightings $R(x)$. The simulation results show that the
NSLQR modeling method can reproduce the three goal pursuit
behaviors, and that the psychological goal pursuit behaviors can be
more beneficial than the CCB in terms of energy saving and stress
reduction in applications where only the goal discrepancy at the
terminal time is of concern, such as marathon races, animal
stalking, beating a deadline or hitting a target.

In this paper only some scalar cases of the goal pursuit process
are studied; studies of the multi-variable cases are the next steps of
our work. In the current study, the parameter $R(x)$ is selected to
reproduce the goal pursuit behaviors. Similar results should be
achievable with a state-dependent goal discrepancy weighting
$Q(x)$, which would be more akin to an intuitive psychological
tendency to employ the GGB and SMB strategies in terminal
goal pursuit processes, whereas the control weighting modeling is
more akin to a conscious choice of the GGB and SMB for their energy
saving and stress reduction benefits. Since for the NSLQR
problem the PDRE has to be solved simultaneously with the
closed-loop system, it is a TPBV problem. An inherent difficulty in
this TPBV problem is how to determine the initial value of $P(t)$.
An Approaching-Horizon algorithm based on a shooting method
is developed to address this problem, and will be presented in
detail in a separate paper.